LMM-Lasso: A Lasso Multi-Marker Mixed Model for Association Mapping with Population Structure Correction
Abstract
Motivation: Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In traits with simple Mendelian architectures, single polymorphic loci explain a significant fraction of the phenotypic variability. However, many traits of interest appear to be subject to multifactorial control by groups of genetic loci. Accurate detection of such multivariate associations is non-trivial and often compromised by limited power. At the same time, confounding influences such as population structure cause spurious association signals that result in false positive findings if they are not accounted for in the model.

Results: We propose LMM-Lasso, a mixed model that allows for both multi-locus mapping and correction for confounding effects. Our approach is simple and free of tuning parameters, effectively controls for population structure, and scales to genome-wide datasets. LMM-Lasso simultaneously discovers likely causal variants and allows for multi-marker-based phenotype prediction from genotype. We demonstrate the practical use of LMM-Lasso in genome-wide association studies in Arabidopsis thaliana and in linkage mapping in mouse, where our method achieves significantly more accurate phenotype prediction for 91% of the considered phenotypes. At the same time, our model dissects the phenotypic variability into components that result from individual SNP effects and from population structure. Enrichment of known candidate genes suggests that the individual associations retrieved by LMM-Lasso are likely to be genuine.

Availability: Code available under XXX.

Contact: {rakitsch, clippert, stegle}@tuebingen.mpg.de
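The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a Lasso can be combined with a mixed-model correction. It assumes the phenotype model y = X beta + u + e with u ~ N(0, sigma_g^2 K) and e ~ N(0, sigma_e^2 I), a known variance ratio `delta` = sigma_e^2 / sigma_g^2, and a given penalty weight `lam`. How these two values are chosen is not covered by the abstract. All function names are hypothetical, and the plain coordinate-descent solver stands in for whatever optimizer the authors actually use.

```python
import numpy as np

def rotate(y, X, K, delta):
    """Whiten y and X so that noise with covariance K + delta*I becomes i.i.d."""
    S, U = np.linalg.eigh(K)              # K = U diag(S) U^T (K symmetric)
    scale = 1.0 / np.sqrt(S + delta)      # assumes S + delta > 0
    return scale * (U.T @ y), scale[:, None] * (U.T @ X)

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]    # drop marker j from the current fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]    # add the updated marker j back
    return beta

def lmm_lasso_sketch(y, X, K, delta, lam):
    """Rotate out the structure captured by K, then fit a sparse multi-marker model."""
    y_rot, X_rot = rotate(y, X, K, delta)
    return lasso_cd(X_rot, y_rot, lam)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 60, 100
    X = rng.standard_normal((n, p))       # simulated genotype matrix (assumption)
    K = X @ X.T / p                       # simple relatedness estimate (assumption)
    y = 2.0 * X[:, 3] + rng.standard_normal(n)
    beta = lmm_lasso_sketch(y, X, K, delta=1.0, lam=0.1)
    print("selected markers:", np.flatnonzero(beta))
```

In this sketch the nonzero entries of the returned coefficient vector play the role of the selected markers, while the rotation absorbs the covariance structure described by `K`.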
Related resources
A Lasso multi-marker mixed model for association mapping with population structure correction
MOTIVATION Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In traits with simple Mendelian architectures, single polymorphic loci explain a significant fraction of the phenotypic variability. However, many traits of interest seem to be subject to multifactorial control by groups of genetic loci. Accurate detection of such multivariat...
A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction
While the linear mixed model (LMM) has shown competitive performance in correcting spurious associations arising from population stratification, family structures, and cryptic relatedness, further challenges remain regarding the complex structure of genotypic and phenotypic data. For example, geneticists have discovered that some clusters of phenotypes are more co-expressed than othe...
Finding Sparse Features in Strongly Confounded Medical Binary Data
A typical task in statistical genetics is to find a sparse linear relation between genotypes and phenotypes, but the data are often confounded by age, ethnicity or population structure. We generalize the linear mixed model (LMM) Lasso approach for feature selection under confounding to the case of binary labels. This case is much more involved, as marginalization over the correlated noise lead...
Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman
Background: Two main issues that challenge model building are the number of events per variable and multicollinearity among exploratory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...
Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of learning a sparse multi-task regression with an application to a genetic association mapping problem for discovering genetic markers that influence expression levels of multiple genes jointly. In particular, we consider the case where the structure over the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at...
Publication date: 2012